Batch Policy Learning under Constraints
When learning policies for real-world domains, two important questions arise:
(i) how to efficiently use pre-collected off-policy, non-optimal behavior data;
and (ii) how to mediate among different competing objectives and constraints.
We thus study the problem of batch policy learning under multiple constraints,
and offer a systematic solution. We first propose a flexible meta-algorithm
that admits any batch reinforcement learning and online learning procedure as
subroutines. We then present a specific algorithmic instantiation and provide
performance guarantees for the main objective and all constraints. To certify
constraint satisfaction, we propose a new and simple method for off-policy
policy evaluation (OPE) and derive PAC-style bounds. Our algorithm achieves
strong empirical results in different domains, including in a challenging
problem of simulated car driving subject to multiple constraints such as lane
keeping and smooth driving. We also show experimentally that our OPE method
outperforms other popular OPE techniques on a standalone basis, especially in a
high-dimensional setting.
Empirical Study of Off-Policy Policy Evaluation for Reinforcement Learning
Off-policy policy evaluation (OPE) is the problem of estimating the online performance of a policy using only pre-collected historical data generated by another policy. Given the increasing interest in deploying learning-based methods for safety-critical applications, many OPE methods have recently been proposed. Because experimental conditions differ across the recent literature, the relative performance of current OPE methods is not well understood. In this work, we present the first comprehensive empirical analysis of a broad suite of OPE methods. Based on thousands of experiments and detailed empirical analyses, we offer a summarized set of guidelines for effectively using OPE in practice, and suggest directions for future research.
One-loop corrections to the metastable vacuum decay
We evaluate the one-loop prefactor in the false vacuum decay rate in a theory
of a self-interacting scalar field in 3+1 dimensions. We use a numerical
method, established some time ago, which is based on a well-known theorem on
functional determinants. The proper handling of zero modes and of
renormalization is discussed. The numerical results in particular show that
quantum corrections become smaller away from the thin-wall case. In the
thin-wall limit the numerical results are found to join into those obtained by
a gradient expansion. Comment: 31 pages, 7 figures
Biomechanical considerations in the pathogenesis of osteoarthritis of the knee
Osteoarthritis is the most common joint disease and a major cause of disability. The knee is the large joint most affected. While chronological age is the single most important risk factor for osteoarthritis, the pathogenesis of knee osteoarthritis in the young patient is predominantly related to an unfavorable biomechanical environment at the joint. This results in mechanical demand that exceeds the ability of a joint to repair and maintain itself, predisposing the articular cartilage to premature degeneration. This review examines the available basic science, preclinical and clinical evidence regarding several such unfavorable biomechanical conditions about the knee: malalignment, loss of meniscal tissue, cartilage defects and joint instability or laxity.
Measurements of ψ(2S) and X(3872) → J/ψπ+π− production in pp collisions at √s=8 TeV with the ATLAS detector
Differential cross sections are presented for the prompt and non-prompt production of the hidden-charm states X(3872) and ψ(2S), in the decay mode J/ψπ+π−, measured using 11.4 fb⁻¹ of pp collisions at √s=8 TeV by the ATLAS detector at the LHC. The ratio of cross sections X(3872)/ψ(2S) is also given, separately for prompt and non-prompt components, as well as the non-prompt fractions of X(3872) and ψ(2S). Assuming independent single effective lifetimes for non-prompt X(3872) and ψ(2S) production gives $R_B = \frac{\mathcal{B}(B \to X(3872) + \mathrm{any})\,\mathcal{B}(X(3872) \to J/\psi\pi^+\pi^-)}{\mathcal{B}(B \to \psi(2S) + \mathrm{any})\,\mathcal{B}(\psi(2S) \to J/\psi\pi^+\pi^-)} = (3.95 \pm 0.32(\mathrm{stat}) \pm 0.08(\mathrm{sys})) \times 10^{-2}$, while separating short- and long-lived contributions, assuming that the short-lived component is due to Bc decays, gives $R_B = (3.57 \pm 0.33(\mathrm{stat}) \pm 0.11(\mathrm{sys})) \times 10^{-2}$, with the fraction of non-prompt X(3872) produced via Bc decays for $p_T(X(3872)) > 10$ GeV being (25 ± 13(stat) ± 2(sys) ± 5(spin))%. The distributions of the dipion invariant mass in the X(3872) and ψ(2S) decays are also measured and compared to theoretical predictions.